# PRESCOTT: <u>Preset-based Cross-Point Architecture</u> for Spin-Orbit-<u>Torque Magnetic Random Access Memory</u>

Liang Chang<sup>1,2</sup>, Zhaohao Wang<sup>1</sup>, Alvin Oliver Glova<sup>2</sup>, Jishen Zhao<sup>3</sup>, Youguang Zhang<sup>1</sup>, Yuan Xie<sup>2</sup> and Weisheng Zhao<sup>1</sup>

<sup>1</sup>Fert Beijing Research Institute, BDBC, School of Electronic and Information Engineering, Beihang University, Beijing, China

<sup>2</sup>University of California, Santa Barbara, CA, USA

<sup>3</sup>University of California, Santa Cruz, CA, USA

<sup>1</sup>{liang.chang, zhaohao.wang, zyg, weisheng.zhao}@buaa.edu.cn

<sup>2</sup>{aomglova,yuanxie}@ece.ucsb.edu <sup>3</sup>{jishen.zhao}@ucsc.edu

Abstract-Owing to its nearly zero leakage power consumption, non-volatile magnetoresistive random access memory (MRAM) is becoming a promising candidate for replacing conventional volatile memories (e.g., SRAM and DRAM). In particular, emerging spin-orbit torque (SOT) MRAM is considered to outperform spin-transfer torque (STT) MRAM owing to its fast switching, separate read/write paths, and lower energy dissipation. However, SOT-MRAM technology is still in its infancy; one key design challenge is that the control of SOT-MRAM, which involves three terminals, is more complicated than that of STT-MRAM. In this paper, we propose a novel MRAM write scheme called PRESCOTT<sup>1</sup>, in which the "0" and "1" data values are written into memory cells through the SOT and STT, respectively. As a result, the write current is unidirectional rather than bi-directional, which addresses the control complexity. Using this unidirectional write scheme, we design a PreSET-based cross-point (CP) MRAM that improves programming speed, write energy dissipation and storage density compared to conventional MRAM. Circuit simulation results demonstrate that our PreSET-based CP MRAM achieves around 67.14% average write energy reduction and 50.86% improvement in programming speed compared with CP STT-MRAM.

Index Terms—MRAM, Spin Orbit Torque, Memory Cell, Cross-Point

#### I. INTRODUCTION

With the advantage of nearly zero leakage power consumption, non-volatile memories (NVMs) such as magnetoresistive RAM (MRAM), phase change RAM (PCRAM), and resistive RAM (ReRAM) have emerged as potential alternatives to conventional memories, including cache and main memory [1] [2]. To date, NVMs have been used to build NV-processor architectures [3] [4]. Among these NVMs, spin-transfer torque MRAM (STT-MRAM) is considered the most promising universal memory candidate thanks to its high density, fast operation, low power consumption, and

<sup>1</sup>In urban dictionary, *Prescott* is a general term for a man who has amazingly good looks and a great personality. We envision our proposed techniques could potentially make the emerging SOT-MRAM to have amazingly better characteristics than the conventional SOT-MRAM memory cell design.



Fig. 1. Two types of SOT-MTJ. (a) In-plane-SOT MTJ (i-SOT-MTJ), where only a Spin Hall Effect (SHE) write current flowing through the Heavy Metal (HM) switches the magnetization of the Free Layer (FL). (b) Perpendicular Anisotropy SOT MTJ (p-SOT-MTJ), where an additional magnetic field ( $B_{ext}$ ) is required to achieve the deterministic switching together with the SHE write current.

almost infinite endurance. The interest in STT-MRAM has increased during the past few years, with industry prototypes and early demonstrators already available in the market [1] [5]. Nevertheless, the application potential of STT-MRAM is limited by some drawbacks including asymmetric write operation, high write latency and energy overhead.

Recently, SOT-MRAM has been proposed to overcome the drawbacks of STT-MRAM [6] [7]. Generally, the storage element of SOT-MRAM consists of a magnetic tunnel junction (MTJ) above a heavy metal (HM) strip, as shown in Fig. 1. The key structure of the MTJ is a tunnel barrier sandwiched between a ferromagnetic reference layer (RL) and a free layer (FL) in contact with the HM. The relative magnetization directions of the FL and RL can be configured to the parallel (P) or anti-parallel (AP) state, resulting in low or high tunnel resistance, respectively. A write current flowing through the HM can generate an SOT that switches the FL magnetization with high speed and low power. The origin of the SOT is the spin Hall effect (SHE) [8] or the Rashba effect [9], both of which are caused by spin-orbit coupling. The relative contributions of the SHE and the Rashba effect depend on the device structure, fabrication process, material type, etc. Here, we assume that the magnetization switching of SOT-MRAM is dominated by the SHE.



Fig. 2. The typical 1T1MTJ bit cell for STT-MRAM and 2T1MTJ bit cell for SOT-MRAM. $I_{SHE}$ is the write current passing through the HM and $I_{STT}$ is the current flowing through the MTJ. Panels (a) and (b) illustrate the conventional write scheme based on bi-directional current for conventional STT-MRAM and conventional SOT-MRAM, respectively, while (c) shows our proposed unidirectional write scheme (UWS).

The principle of the SHE is illustrated in Fig. 1. When a charge current passes through the HM, the strong spin-orbit coupling causes electrons with opposite spin polarizations to accumulate on opposite sides of the HM. As a result, a pure spin current is injected into the FL of the MTJ and induces a spin torque (i.e., the SOT) on the magnetization. The behavior of SOT-driven magnetization dynamics depends on the type of magnetic anisotropy in the MTJ. The in-plane-anisotropy MTJ (i-MTJ) can be deterministically switched by the SOT if the anisotropy axis of the FL is collinear with the polarization direction of the SHE-induced spin current, as shown in Fig. 1 (a). For the perpendicular-anisotropy MTJ (p-MTJ) shown in Fig. 1 (b), the polarization of the injected spin current is perpendicular to the anisotropy axis of the FL, so a larger spin torque is generated and the magnetization reverses faster than with conventional STT. However, an additional magnetic field is required to achieve deterministic switching. The need for this magnetic field limits the application of the SOT p-MTJ in NVMs and logic circuits. Some solutions have been proposed to avoid the use of the magnetic field. For example, structural mirror asymmetry has been developed to replace the magnetic field, but it requires the fabrication of a wedge film and cannot be applied to the MTJ [10]. Spin-Hall-assisted STT (SHA-STT) has been proposed to eliminate the incubation delay of the STT and thereby accelerate magnetization switching in the absence of a magnetic field [11]. Very recently, it was demonstrated that a strong SHE can be induced in antiferromagnetic materials such as PtMn and IrMn [12] [13] [14] [15]. Subsequent experiments verified the feasibility of simultaneously generating the SOT and an exchange bias in antiferromagnet/ferromagnet (AFM/FM) structures. In this way, the external magnetic field is replaced with the exchange bias, and deterministic switching of perpendicular magnetization can be achieved. The exchange bias shows better scalability and lower cell-to-cell disturbance than the conventional Oersted field. The architecture presented in this work is expected to be implemented with the AFM-SOT p-MTJ.

The write operation of both the conventional STT-MTJ and the conventional SOT-MTJ requires a bidirectional current, as shown in Fig. 2 (a) and (b), respectively. In this paper, we propose to write a three-terminal p-MTJ with two unidirectional write currents: an STT current for writing the P state and an SOT current for writing the AP state, as shown in Fig. 2 (c). Based on our unidirectional write scheme (UWS), the contributions of this paper are as follows:

- Generally, the STT suffers from a write asymmetry issue. Writing the AP state consumes more energy and takes longer than writing the P state because of the lower spin torque efficiency. Our proposed UWS writes the AP state with the high-efficiency SOT, improving the write performance.
- The cross-point (CP) architecture can provide ultrahigh density for MRAM by sharing the driving transistors. However, the operation of an SOT-MTJ based CP architecture is significantly influenced by the sneak current, as three terminals are involved for each cell. Our proposed UWS makes it possible to implement a preset-based CP MRAM with the three-terminal MTJ. Simulation results demonstrate that the proposed CP MRAM achieves higher speed and lower energy than the STT-based one with a small area overhead.

The remainder of this paper is organized as follows. In section II, we review the design of the SOT-MRAM. Then, the details of UWS and preset operation are presented in section III. Next, we introduce the architecture and working principle of preset-based CP MRAM with three-terminal MTJ in section IV. Afterwards, the simulation results are demonstrated to verify the function of the proposed CP MRAM and to evaluate the performance in section V. Finally, we conclude this work in section VI.

#### II. OVERVIEW OF SOT-MTJ BASED MEMORY

Several electrical models of the SOT-MTJ have been developed based on experimental results and spintronic theories [16] [17] [11] [18]. These models can be used in circuit simulation to evaluate the performance of SOT-MRAM before a real device or chip is fabricated. A number of SOT-MTJ based memories have been developed to show the advantages in terms of low latency and low energy consumption. Using the SHE-assisted STT-MTJ, E. Eken et al. designed a high-density MRAM by sharing the SHE control transistors. Then a



Fig. 3. (a) Preset and data-in operations of array structure. (b) Schematic of pre-charged sensing amplifier (PCSA). Glossary: Bit-Line (BL), Source-Line (SL), Selected-SL (SSL), Word-Line (WL), Reference Branch (Ref), Clock Signal (CLK), Read Enable (RE).

disturbance-free MRAM was also proposed based on the same device [19]. Y. Seo et al. proposed an area-efficient SOT-RAM by replacing one access transistor with a Schottky diode [20], which achieved 30% and 50% reductions in bit-cell area compared to the standard STT-MRAM and SOT-MRAM, respectively. F. Oboril et al., L. Chang et al. and Y. Seo et al. evaluated the advantages of SOT-MRAM as a multi-level cache [6] [21] [22] with the SOT-p-MTJ, SHE-assisted STT-p-MTJ and SOT-i-MTJ, respectively. Those works concluded that SOT-MRAM outperforms STT-MRAM in terms of write speed and power consumption for cache replacement. X. Wang et al. built an SOT-CAM architecture with multi-bit match-in-space and single-bit nondestructive self-reference functionality, which significantly reduced the power consumption owing to low leakage current and small standby power [23]. A series of works focused on SOT-MTJ based logic circuits as well. K. Kwon et al., M. Kazemi et al., K. Jabeur et al. and Z. Wang et al. proposed high-speed and low-power magnetic flip-flops based on the SOT-MTJ [24] [17] [25] [26]. E. Deng et al., Y. Zhang et al. and M. Chen et al. extended the SOT-MTJ to the design of full-adder architectures and Domino-style spin logic [27] [28] [29]. These applications indicate that the SOT-MTJ can also be used to design high-speed, low-power logic that replaces CMOS-based logic.

Those works used the three-terminal SOT-MTJ as the storage element in conventional circuits and architectures, whereas we aim to design a high-density SOT-MRAM with the CP architecture. For this purpose, we propose a novel write scheme for the three-terminal MTJ and develop a preset operation to improve the write speed, reduce the energy dissipation and mitigate the sneak current of the CP MRAM.

#### III. PRESET-BASED MRAM WITH UNIDIRECTIONAL WRITE SCHEME

In this section, we present a novel unidirectional write scheme (UWS) for the three-terminal SOT-MTJ. The detailed operations of the bit-cell are demonstrated as well.

#### A. Unidirectional Write Scheme

The write operation of the UWS SOT-MTJ is illustrated in Fig. 2. Compared with the conventional SOT-MTJ, one of the two write currents flows through the MTJ rather than through the HM. To achieve energy-efficient operation, several rules are set for the UWS SOT-MTJ. First, the "P" state of the SOT-MTJ is written by the STT current flowing through the MTJ (see $I_{STT}$ in Fig. 2 (c)), whereas the "AP" state is written by the SHE current passing through the HM (see $I_{SHE}$ in Fig. 2 (c)). This rule mitigates the asymmetry of STT switching, as writing the "P" state consumes less energy and incurs a shorter delay than writing the "AP" state in an STT-MTJ. In addition, both $I_{STT}$ and $I_{SHE}$ are unidirectional, simplifying the control of the SOT-MTJ. Second, to read this SOT-MTJ, one small read current should flow through the MTJ and the HM (half or entire).

For the conventional MTJ cell switched by a bi-directional current, the voltage drop across the NMOS transistor limits the write current, and the problem becomes more serious as CMOS technology scales [30]. Based on our proposed UWS, a typical 2T-1MTJ memory cell is shown in Fig. 2 (c). In this memory cell, thanks to the UWS, the NMOS transistors provide better driving capability and avoid the voltage-drop problem.

#### B. Design of Preset-based MRAM

In the conventional SOT-MRAM, two access transistors are required for each bit cell for the read and write paths. Our proposed UWS can be used to design preset-based MRAM in which each bit cell consists of only one access transistor and one SOT-MTJ. The detailed operation is described as follows.

**Write operation:** Fig. 3 (a) shows a typical 4-cell structure of the proposed MRAM. The write operation consists of two stages: the preset stage and the data-in stage. During the preset stage, the word line (WL) is activated to enable the selected row. The Selected Source Line (SSL) is connected to VDD and the Source Line (SL) is grounded, so that a current flows through the HM and *PRESETs* all the bit cells within the selected row to the "AP" state. During the data-in stage, according to the programmed data, some bit cells need to be written into the "P" state. For that, SL and SSL are both connected to VDD. For the bit cells to be programmed, the Bit Lines (BLs) are grounded. Two currents flow through the halves of the HM and the MTJ to write the "P" state into the MTJs. For the other bit cells, the BLs are left floating. No current passes through their MTJs and their states are left unchanged. In this stage, the voltage across the HM is balanced to avoid any current flowing along the HM.
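The two-stage write can be summarized with a minimal behavioral sketch. It models only the logical state of each cell, not currents or timing, and it borrows the logic encoding defined later in the simulation section ("AP" as 0, "P" as 1). The function names `preset_row`, `data_in_row` and `write_row` are hypothetical and introduced only for illustration.

```python
from typing import List

AP, P = 0, 1  # logic encoding used in the simulation section: "AP" = 0, "P" = 1


def preset_row(row: List[int]) -> List[int]:
    """Preset stage: the SOT current through the HM sets every cell to AP."""
    return [AP for _ in row]


def data_in_row(row: List[int], data: List[int]) -> List[int]:
    """Data-in stage: a cell whose BL is grounded (data bit 1) is written to P;
    a cell whose BL is left floating (data bit 0) keeps its current state."""
    if len(row) != len(data):
        raise ValueError("row and data must have the same length")
    return [P if bit == 1 else state for state, bit in zip(row, data)]


def write_row(row: List[int], data: List[int]) -> List[int]:
    """Two-stage write: preset the whole row to AP, then program the P cells."""
    return data_in_row(preset_row(row), data)


if __name__ == "__main__":
    # A row initially holding [1, 1, 1, 1] is preset to [0, 0, 0, 0]
    # and then programmed with the data pattern [1, 0, 1, 0].
    print(write_row([1, 1, 1, 1], [1, 0, 1, 0]))
```

Because every cell is AP after the preset stage, the data-in stage only ever needs the unidirectional STT current that writes "P".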

**Read operation:** For the read operation, Sense Amplifiers (SAs) are connected to the BLs, and both SL and SSL are grounded. In this case, two read currents flow through the memory cell branch and the reference cell branch. The stored data is read out by comparing the two read currents. Fig. 3 (b) shows a possible read scheme implemented with the pre-charged sense amplifier (PCSA) [31].



Fig. 4. The architecture of the preset-based cross-point SOT-RAM, including the following key components: memory array, reference, control transistors and SAs. SL and SSL should be connected to V or V/2. Glossary: Read Enable odd line (RE\_O), RE even line (RE\_E), Source Control Transistor (SCT), Ground CT (GCT), Data-In CT (DICT), source up-control signals (U0-U3), source down-control signals (D0-D3), ground up-control signals (UC0-UC3), ground down-control signals (DC0-DC3), Data-In control (DI0-DI3), C0-C3: memory cells, O0-O3: outputs.

#### IV. PRESET-BASED CROSS POINT MRAM

Beyond the above 1T1SOT-MTJ MRAM, in this section we develop a cross-point (CP) MRAM wherein access transistors are not necessary in memory cells. We present the architecture and operation of the proposed CP MRAM, then highlight the mitigation of sneak current.

#### A. Overview of Preset-Based Cross-Point MRAM

Figure 4 shows the architecture of the preset-based CP MRAM (PreSET-CP MRAM), which is composed of four parts: memory array, reference array, control transistors and SAs. In the memory array, MTJs are arranged in a criss-crossing diagonal pattern in order to balance the read branches. To control the directions of the write currents, the drain terminals of the ground-control transistors (GCT) and data-in-control transistors (DICT) are always grounded, while the source terminals of the source-control transistors (SCT) are connected to *VDD* or *VDD/2*. In the reference array, Read-Enable-odd (RE\_O) and Read-Enable-even (RE\_E) are used to control the read operation of the odd rows and even rows, respectively. Here, the SA is designed based on the PCSA structure shown in Fig. 3 (b).



Fig. 5. The operation of PreSET-based Cross Point architecture (a) Preset operation used to SET the entire word. (b) and (c) The data-in operation to write "P" state depending on the input data pattern. (d) The burst read operation.

#### B. The Operation of PreSET-CP MRAM

The architecture of the proposed PreSET-CP MRAM enables preset-based write operation through the SOT. In addition, based on criss-crossing diagonal pattern, the burst read operation is employed for mitigating the sneak current of the PreSET-CP MRAM. The detailed operations are shown as follows.

**Preset-based write operation for CP MRAM:** Based on the proposed UWS for the three-terminal SOT-MTJ, the preset-based write operation is designed for our CP MRAM. The preset stage of a word in the CP architecture is illustrated in Fig. 5 (a). D0 and UC0 are activated so that currents flow through the HM of each memory cell. The preset value ("AP" state) is programmed into the memory cells by the SOT. The data-in stage is shown in Fig. 5 (b) and (c). The source-control transistors U0 and D0 are activated to provide two equal voltages for SL and SSL; in this case no net current flows through the HM and SOT is avoided. At the same time, in Fig. 5 (b) both DI0 and DI1 are grounded, and two currents flow through the half HM and the MTJ so that the "P" state is programmed into the two memory cells, whereas in Fig. 5 (c) only DI0 is activated so that the "P" state is programmed into one cell while the other cell is left unchanged. The data-in stage suffers from the sneak current since there are floating nodes in the array structure. Our solution is to set the SL and SSL of the unselected words to V/2.

**Burst Read operation for CP MRAM:** For the read operation, a burst read scheme is designed for the CP MRAM, as shown in Fig. 5 (d). Both the source transistors controlled by *U0* and *D0* and the data-in transistors controlled by DI0 and DI1 are deactivated; in the meantime, the transistors controlled by *UC0* and *DC0* are activated. The sense currents flow through the MTJ and the two half-HMs. The read operations of the odd

TABLE I. SOT-MTJ COMPACT MODEL PARAMETERS

| Parameter | Description                                    | PMA SOT-MTJ                         |
|-----------|------------------------------------------------|-------------------------------------|
| α         | Gilbert damping constant                       | 0.05                                |
| TMR0      | TMR ratio under zero bias                      | 1.2                                 |
| $t_F$     | Free layer thickness (nm)                      | 0.7                                 |
| η         | Spin Hall angle                                | 0.3                                 |
| Area      | MTJ area (nm<sup>2</sup>)                      | $\frac{\pi}{4} \times 60 \times 60$ |
| P         | Spin polarization                              | 0.62                                |
| l, w, d   | Heavy metal dimensions (nm)                    | 60, 70, 3                           |
| $M_s$     | Saturation magnetization (A m<sup>-1</sup>)    | $1 \times 10^{6}$                   |
| $H_{eff}$ | Effective anisotropy field (A m<sup>-1</sup>)  | 200060                              |
| $B_{ext}$ | External magnetic field (mT)                   | 48                                  |
| E         | Thermal stability factor                       | 60.09                               |

and even words are enabled by *RE\_O* and *RE\_E*, respectively (see Fig. 4). In the read stage, the SAs are first pre-charged and then discharged in a burst through the reference and memory cells. This arrangement has several advantages. First, the leakage currents of the transistors in the two branches of each SA are balanced. Second, a simple reference array structure can be achieved. Third, the burst read scheme provides parallel sensing for the CP MRAM, mitigating the sneak current.
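The control-signal settings described for word 0 in Fig. 5 can be collected into a small lookup, shown here as a sketch. It only records which control signals the text names as activated in each phase. The function name `active_signals` and the mapping of data bit *i* to line DI*i* are assumptions made for illustration, not part of the original design description.

```python
from typing import Sequence, Set


def active_signals(phase: str, data: Sequence[int] = ()) -> Set[str]:
    """Return the control signals activated for word 0 in a given phase.

    phase is one of "preset", "data_in" or "read". For "data_in", data[i] == 1
    means cell i is to be written to "P" (assumed to be driven by line DI<i>).
    """
    if phase == "preset":
        # Currents flow along the HM of every cell, writing "AP" by SOT.
        return {"D0", "UC0"}
    if phase == "data_in":
        # U0 and D0 put equal voltages on SL and SSL (no net HM current);
        # only the data-in lines of cells to be written to "P" are activated.
        di_lines = {f"DI{i}" for i, bit in enumerate(data) if bit == 1}
        return {"U0", "D0"} | di_lines
    if phase == "read":
        # Source and data-in transistors are off; ground-control ones are on.
        return {"UC0", "DC0"}
    raise ValueError(f"unknown phase: {phase!r}")


if __name__ == "__main__":
    print(sorted(active_signals("preset")))
    print(sorted(active_signals("data_in", [1, 1])))  # case of Fig. 5 (b)
    print(sorted(active_signals("data_in", [1, 0])))  # case of Fig. 5 (c)
    print(sorted(active_signals("read")))
```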

#### C. Consideration for Mitigating the Sneak Current

The sneak current is a key issue in the CP architecture since it increases the energy dissipation and causes read and write errors. Here, we emphasize the approach of mitigating the sneak current to show the benefits of our proposed UWS for PreSET-CP MRAM architecture.

For the selected word, the sneak current is completely avoided during the preset stage because the entire row is written in parallel and is isolated from other words. During the data-in stage, a small sneak current is caused by the floating nodes. However, it is alleviated by the fact that no net current passes through the HM. The parallel sensing reduces the sneak current significantly and mitigates the influence of the leakage current. For the unselected words, the V/2 compensation can be used to decrease the sneak current efficiently in the CP MRAM as well. Furthermore, the size of the data-in control transistor can be kept relatively small in order to limit the impact of parasitic capacitance. Therefore, lower energy dissipation and higher stability of the CP MRAM architecture can be achieved based on our proposed preset operation.

#### V. CIRCUIT SIMULATION AND ANALYSES

In this section, the electrical model for UWS SOT-p-MTJ is developed to verify the write and read operations of the UWS based memory cell and preset-based CP MRAM (PreSET-CP MRAM). Afterwards, the cell area, the energy dissipation and the write speed of the proposed CP MRAM are analyzed based on simulation results.

#### A. Electrical SOT-MTJ Model and Simulation Configuration

To reveal the behavior of our proposed UWS SOT-MTJ, we implement a compact electrical model in Verilog-A. The magnetization dynamics in the FL of the p-SOT-MTJ are described by solving a modified Landau-Lifshitz-Gilbert (LLG)



Fig. 6. The simulation results based on our developed electrical model of the three-terminal MTJ. $V_{M1}$ and $V_{M2}$ are the voltages applied to the gates of M1 and M2, respectively (see Fig. 2). $m_z$ is the z-component of the normalized magnetization of the FL; the values "1" and "-1" represent the "P" and "AP" states, respectively.

equation that accounts for both STT and SHE. Fig. 1 (b) shows the structure of the SOT-MTJ model. The parameters of the SOT-p-MTJ are listed in Table I.

The circuit simulations are performed with CMOS 40nm design kit [32] and the UWS SOT-MTJ electrical model on Cadence platform. We employ the p-STT-MTJ-based CP MRAM (STT-CP) as baseline, in which the MTJ technology is the same as that of PreSET-CP MRAM. Note that, the CP MRAM with conventional SOT-MTJ is not available since the sneak current flows through HM and induces huge disturbances on write or read operation<sup>2</sup>.

We define the "AP" state of SOT-MTJ as logic "0" and the "P" state as "1". We build a  $4\times4$  memory array to validate the function of PreSET-CP MRAM and simulate an  $8\times8$  memory array to analyze the performance of PreSET-CP MRAM. The detailed simulation waveform and numerical results are shown as follows.

#### B. Functional Validation for CP MRAM

Based on the memory cell shown in Fig. 2 (c), we verify the operation of our proposed UWS SOT-MTJ. The simulation result of the UWS SOT-MTJ is illustrated in Fig. 6. The UWS SOT-MTJ is switched from the "P" state to the "AP" state by activating $V_{M1}$ with a 0.8 ns $I_{SHE}$ write pulse. After that, $V_{M2}$ is turned on to provide $I_{STT}$ so that the MTJ is programmed to the "P" state with a latency of around 1.6 ns.

To validate the function of the PreSET-CP MRAM, a series of operations is executed on a  $4 \times 4$  memory array. The corresponding result is illustrated in Fig. 7. We set the initial value of the selected word to [1111]. Firstly, the burst read operation is verified through the simulation. Next, data [0000] is written during the preset stage and the expected value [1010] is programmed during the data-in stage. Finally, the read operation is performed again to validate the successful

<sup>&</sup>lt;sup>2</sup>There are three possible scenario for the connection among the BL, SL and SSL. Only in our proposed CP architecture can mitigate sneak current thanks to the preset-based operation and parallel sensing scheme which cannot be implemented by another two scenarios.



Fig. 7. The transient simulation of the preset-based CP MRAM. The word to be written is C3-C0 (see Fig. 4). First, [0000] is written into this word during the preset stage. Afterwards, data [1010] is programmed during the data-in stage. RE is the read-enable signal and CLK is the control signal. $m_{z3}$-$m_{z0}$ reflect the magnetization status of C3-C0. O3-O0 are the output results of the SAs.

programming. In the figure, the read enable (RE) signal is used to control the operation mode. ST0-ST3 represent the magnetization states of C0-C3 shown in Fig. 4 and O0-O3 are the output signals of the SAs.

#### C. Performance Analysis for CP MRAM

For the performance evaluation, we build a memory-array architecture that includes a decoder, SAs, a write driver, and a control module, as shown in Fig. 8 (a). To program one word, the word is selected by the decoder and preset by the CTRL module. The write driver (WD) enables or disables the STT write current depending on the input data during the data-in stage. We assume that the size of a word is N and the number of "1" values in the word is n. Two periods are required to complete the programming of the entire word, as demonstrated in Fig. 8 (b). The architecture of the STT-CP is similar to [33], except that the i-MTJ is replaced with the p-MTJ and the V/2 voltage compensation is added on the word line. For the STT-CP, the data programming takes two periods as well.

The area of the proposed UWS SOT-MTJ is mainly determined by the transistor size since MTJs can be fabricated above CMOS transistors. The estimation of cell area in CP MRAM follows equation (1), where  $A_{SA}$ ,  $A_{DI}$ ,  $A_{ng}$  and  $A_{ps}$  are the areas of SA, data-in transistor, NMOS ground transistor and PMOS source transistor, respectively. The  $N_{BL}$  and  $N_W$  are the number of BLs and words of the memory array, respectively. Here, in order to meet the functional requirements, we set  $A_{SA}=160 {\rm F}^2$ ,  $A_{DI}=5 {\rm F}^2$ ,  $A_{ng}=15 {\rm F}^2$ ,  $A_{ps}=30 {\rm F}^2$ , where F is the feature size of CMOS technology.



Fig. 8. (a) The overall architecture of proposed CP-MRAM. (b) The programming period consists of preset-stage for N bit-cells setting and data-in stage for n bit-cells programming. Glossary: control module (Ctrl), N is the word size and n represents the number of bit-cells to be written into "1".



Fig. 9. The comparison of cell area between STT-CP and PreSET-CP as a function of the number of words. The number of BLs is fixed. As the number of words is increased, the cell area becomes constant.

$$A_{cp} = \frac{\left(\frac{A_{SA}}{2} + A_{DI}\right) \times N_{BL} + 2(A_{ng} + A_{ps} + 2) \times N_{W}}{N_{BL} \times N_{W}} \tag{1}$$

Figure 9 shows the comparison of the cell area between the PreSET-CP MRAM and the STT-CP. The memory cell of the STT-CP is larger than that of the PreSET-CP MRAM since the STT-MTJ based memory cell requires a larger source transistor for writing the "AP" state. As the number of words increases, the cell area approaches a constant, which is around $13F^2$ for $A_{STT}$ and $11F^2$ for $A_{SOT}$.
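The trend in Fig. 9 follows directly from equation (1), which a short sketch can evaluate. The default areas below are the values quoted for equation (1) in units of F². The bit-line count of 8 used in the example is a hypothetical choice, since the value used for Fig. 9 is not stated, and the function name `cell_area_f2` is introduced only for illustration.

```python
def cell_area_f2(n_bl: int, n_w: int,
                 a_sa: float = 160.0, a_di: float = 5.0,
                 a_ng: float = 15.0, a_ps: float = 30.0) -> float:
    """Evaluate equation (1): area per cell in units of F^2.

    n_bl is the number of bit lines and n_w the number of words.
    """
    if n_bl <= 0 or n_w <= 0:
        raise ValueError("n_bl and n_w must be positive")
    per_bit_line = (a_sa / 2 + a_di) * n_bl   # SA and data-in transistors
    per_word = 2 * (a_ng + a_ps + 2) * n_w    # ground and source transistors
    return (per_bit_line + per_word) / (n_bl * n_w)


if __name__ == "__main__":
    n_bl = 8  # hypothetical bit-line count; not given in the text
    for n_w in (8, 64, 512, 4096):
        print(n_w, round(cell_area_f2(n_bl, n_w), 3))
```

Dividing the numerator term by term gives $A_{cp} = (A_{SA}/2 + A_{DI})/N_W + 2(A_{ng} + A_{ps} + 2)/N_{BL}$, so with a fixed number of BLs the first term vanishes as the number of words grows and the area settles at the second term.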

To analyze the energy dissipation and programming speed per word, we set the word size N=8 and design the write data patterns DPn (n represents the number of "1"s). The energy of our proposed SOT-MTJ based CP MRAM includes two parts: preset energy and data-in energy. In the data-in stage, energy is dissipated only by the operation of writing "1". Therefore, the total energy depends only on the



Fig. 10. The write energy dissipation and programming delay versus data pattern, validating the advantages of PreSET-CP over STT-CP. Here, the value on the horizontal axis represents the number of "1"s in the data pattern.

number of written "1"s. An extreme example is writing DP0, which consumes only the preset energy. The programming energy can be estimated by (2), where $E_{word}$, $E_{SET}$, $E_{DI}$, N and n are the energy per word, the preset energy per memory cell (0.1 pJ), the data-in energy per memory cell (1.28 pJ), the number of bits in one word, and the number of "1"s in the word, respectively.<sup>3</sup>

$$E_{word} = E_{SET} \times N + E_{DI} \times n \tag{2}$$
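Equation (2) can be written as a one-line function, sketched below. The function name `word_energy_pj` is hypothetical, and the per-cell energies are passed in explicitly rather than fixed, since they come from a single-cell simulation. The example evaluates only the DP0 case, where the data-in term vanishes.

```python
def word_energy_pj(n_ones: int, word_size: int,
                   e_set_pj: float, e_di_pj: float) -> float:
    """Equation (2): E_word = E_SET * N + E_DI * n, all energies in pJ."""
    if not 0 <= n_ones <= word_size:
        raise ValueError("n_ones must lie between 0 and word_size")
    return e_set_pj * word_size + e_di_pj * n_ones


if __name__ == "__main__":
    # DP0 with N = 8: only the preset energy of the eight cells is spent.
    print(word_energy_pj(n_ones=0, word_size=8, e_set_pj=0.1, e_di_pj=1.28))
```

The preset term is paid for every cell of the word regardless of the data, while each additional "1" adds exactly one data-in energy.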

Our proposed PreSET-CP MRAM outperforms the STT-CP counterpart in both energy dissipation and programming delay, as shown in Fig. 10. In the case of writing DP0, the STT-CP MRAM dissipates significant energy and incurs a long delay since writing the "AP" state is very inefficient. In contrast, in the proposed PreSET-CP MRAM the "AP" state is written through the more efficient SOT-based preset operation. As the number of "1"s increases, the energy dissipation of the PreSET-CP MRAM ($E_{SOT}$) increases slightly since more "1"s must be written in the data-in stage. Meanwhile, as fewer "0"s are written, the energy of the STT-CP decreases rapidly. Nevertheless, our proposed CP MRAM is still superior to the STT-CP MRAM as the sneak current is totally avoided during the preset stage.

For the energy, there is a small difference between the calculation and the simulation of the proposed PreSET-CP MRAM, which is caused by the sneak current and wire resistance (see the black and red dotted lines). For writing DP8, the energies of the PreSET-CP MRAM and the STT-CP are 2.83 pJ and 2.63 pJ, respectively. The average energy dissipation is described in equation (3). Here, $E_{average}$, $P_i$, $E_i$ and N represent the average energy dissipation, the probability that a word contains i "1"s, the energy for writing data pattern DPi, and the word size, respectively. Based on Fig. 10, the average write energy dissipation per word in the STT-CP is about 6.29 pJ, whereas it is nearly 2.07 pJ in our proposed PreSET-CP MRAM, a saving of around 67.14%.



Fig. 11. The influence of the asymmetry of STT switching on the energy dissipation.

$$E_{average} = \sum_{i=0}^{N} P_i E_i, \quad \text{with } P_i = C(N, i)\,\frac{1}{2^N} \tag{3}$$
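Equation (3) weights the energy of each data pattern by a binomial probability, which the following sketch evaluates. It assumes that each bit of a word is "0" or "1" with equal probability, which is what the $1/2^N$ factor implies. The function names are hypothetical, and the per-pattern energies must be supplied by the reader (for example, read from Fig. 10), since they are not tabulated in the text.

```python
from math import comb
from typing import List, Sequence


def pattern_probabilities(word_size: int) -> List[float]:
    """P_i = C(N, i) / 2^N for i = 0..N (equally likely bit values assumed)."""
    return [comb(word_size, i) / 2 ** word_size for i in range(word_size + 1)]


def average_energy(energies: Sequence[float]) -> float:
    """Equation (3): energies[i] is the write energy for a word with i ones."""
    word_size = len(energies) - 1
    probs = pattern_probabilities(word_size)
    return sum(p * e for p, e in zip(probs, energies))


def relative_saving(baseline: float, proposed: float) -> float:
    """Fraction of the baseline energy that the proposed design saves."""
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    # Weights for N = 8; DP4 is the most likely pattern class (70/256).
    print(pattern_probabilities(8))
    # Saving computed from the reported averages (6.29 pJ vs. 2.07 pJ).
    print(round(relative_saving(6.29, 2.07), 3))
```

With the rounded averages of 6.29 pJ and 2.07 pJ, the relative saving evaluates to about 0.67, in line with the reported reduction.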

For the programming delay, the PreSET-CP MRAM stays at 2 ns except for the case of DP0, whereas the delay of the STT-CP MRAM varies over a considerably wide range with the DP. The average programming delay can be calculated on a principle similar to equation (3). The average reduction of programming delay for the PreSET-CP MRAM is nearly 50.86% compared with the STT-CP MRAM, as the writing of "AP" is achieved by SOT in our proposed PreSET-CP MRAM rather than by the inefficient STT in the STT-CP MRAM.

#### D. Impact of STT Asymmetry

The performance of STT-MRAM is also influenced by the asymmetry of STT switching between "AP" and "P" states. The extent of asymmetry is evaluated by a factor shown in equation (4) [34], where  $\epsilon$ ,  $\Lambda$  and P are the STT efficiency, asymmetry factor and Spin Polarization, respectively,  $\mathbf{m}$  and  $\mathbf{m}_p$  are the magnetization of the FL and RL, respectively.

$$\epsilon = \frac{P\Lambda^2}{(\Lambda^2 + 1) + (\Lambda^2 - 1)(\mathbf{m} \cdot \mathbf{m_p})} \tag{4}$$
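To make the role of $\Lambda$ concrete, equation (4) can be evaluated at the two collinear configurations, as in the sketch below. The spin polarization 0.62 is taken from Table I and the $\Lambda$ values are those examined in Fig. 11. The function name `stt_efficiency` is hypothetical, and treating $\mathbf{m} \cdot \mathbf{m_p} = +1$ as the parallel configuration and $-1$ as the anti-parallel one is an assumption about the sign convention.

```python
def stt_efficiency(polarization: float, asymmetry: float,
                   m_dot_mp: float) -> float:
    """Equation (4): STT efficiency for a given m . m_p in [-1, 1]."""
    if not -1.0 <= m_dot_mp <= 1.0:
        raise ValueError("m_dot_mp must lie in [-1, 1]")
    lam2 = asymmetry ** 2
    return polarization * lam2 / ((lam2 + 1) + (lam2 - 1) * m_dot_mp)


if __name__ == "__main__":
    P = 0.62  # spin polarization from Table I
    for lam in (1.1, 1.3, 1.5):
        eff_parallel = stt_efficiency(P, lam, +1.0)      # m aligned with m_p
        eff_antiparallel = stt_efficiency(P, lam, -1.0)  # m opposite to m_p
        print(lam, eff_parallel, eff_antiparallel,
              eff_antiparallel / eff_parallel)
```

Substituting the two limits into equation (4) gives $\epsilon = P/2$ for $\mathbf{m} \cdot \mathbf{m_p} = +1$ and $\epsilon = P\Lambda^2/2$ for $\mathbf{m} \cdot \mathbf{m_p} = -1$, so the two efficiencies differ by a factor of $\Lambda^2$ and coincide only when $\Lambda = 1$.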

The results of the energy comparison are illustrated in Fig. 11, where $\Lambda$ is set to 1.1, 1.3 and 1.5. In all three cases, the average energy dissipation of the PreSET-CP MRAM is smaller than that of the STT-CP counterpart. The preset stage of our proposed PreSET-CP MRAM is not influenced by the asymmetry factor, resulting in a weak dependence of the energy dissipation on $\Lambda$. Therefore, compared with the STT-CP MRAM, the PreSET-CP MRAM is an energy-efficient design regardless of the STT asymmetry, thanks to the preset-based operation.

# VI. CONCLUSION

We have proposed a novel UWS for the three-terminal MTJ. Based on this scheme, the MTJ is switched to the AP and P states with SHE and STT write currents, respectively. Benefiting from the proposed UWS, we designed a PreSET-based CP MRAM and demonstrated its advantages in storage density, programming speed and write energy dissipation over the conventional STT-CP MRAM. Transient simulation was performed to validate the function and analyze the performance of the proposed CP MRAM. Based on the simulation results, the average write energy and programming speed were improved by nearly 67.14% and 50.86%, respectively.

<sup>3</sup>These data are obtained from the single-cell simulation.

#### ACKNOWLEDGMENT

The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. L. Chang, Z. Wang, Y. Zhang and W. Zhao are supported by the projects from National Natural Science Foundation of China under Grant No. 61571023, 61501013 and 61627813, Beijing Municipal of Science and Technology under Grant No. D15110300320000, and the International Collaboration Projects No. 2015DFE12880 and B16001. A. Glova and Y. Xie are supported in part by NSF grants 1500848/1719160 and DOE grant DE-SC0013553. J. Zhao is supported by NSF 1652328. L. Chang and Z. Wang contributed equally to this paper. Corresponding authors: Weisheng Zhao and Yuan Xie.

#### REFERENCES

- [1] N. D. Rizzo, D. Houssameddine and J. Janesky et al. A Fully Functional 64 Mb DDR3 ST-MRAM Built on 90 nm CMOS Technology. *IEEE Transactions on Magnetics*, 49(7):4441–4446, 2013.
- [2] H.-S. Philip Wong, S. Raoux, S. Kim, J. Liang, J. P. Reifenberg, B. Rajendran, M. Asheghi and K. E. Goodson. Phase Change Memory. *Proceedings of the IEEE*, 98(12):2201–2227, 2010.
- [3] K. Ma, Y. Zheng, S. Li, K. Swaminathan, X. Li, Y. Liu, J. Sampson, Y. Xie and V. Narayanan. Architecture Exploration for Ambient Energy Harvesting Nonvolatile Processors. In *International Symposium on High Performance Computer Architecture (HPCA)*. IEEE, Feb 2015.
- [4] Y. Liu, Z. Wang, A. Lee et al. 4.7 A 65nm ReRAM-enabled Nonvolatile Processor with 6x Reduction in Restore Time and 4x Higher Clock Frequency Using Adaptive Data Retention and Self-Write-Termination Nonvolatile Logic. In *IEEE International Solid-State Circuits Conference (ISSCC)*, pages 84–86. IEEE, Jan 2016.
- [5] K. Rho, K. Tsuchida, D. Kim et al. 23.5 A 4Gb LPDDR2 STT-MRAM with Compact 9F<sup>2</sup> 1T1MTJ Cell and Hierarchical Bitline Architecture. In *IEEE International Solid-State Circuits Conference (ISSCC)*, pages 396–397, Feb 2017.
- [6] F. Oboril, R. Bishnoi, M. Ebrahimi and M. B. Tahoori. Evaluation of Hybrid Memory Technologies Using SOT-MRAM for On-chip Cache Hierarchy. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 34(3):367–380, 2015.
- [7] Y. Kim, X. Fong and K. Roy. Spin-Orbit-Torque-Based Spin-Dice: A True Random-Number Generator. *IEEE Magnetics Letters*, 6, 2015.
- [8] L. Liu, C. Pai, Y. Li, H. W. Tseng, D. C. Ralph, R. A. Buhrman. Spin-Torque Switching with the Giant Spin Hall Effect of Tantalum. *Science*, 336(6081), 2012.
- [9] I. M. Miron, G. Gaudin, S. Auffret, B. Rodmacq, A. Schuhl, S. Pizzini, J. Vogel and P. Gambardella. Current-driven Spin Torque Induced by the Rashba Effect in a Ferromagnetic Metal Layer. *Nature Materials*, 9(3):230, 2010.
- [10] G. Yu, P. Upadhyaya and Y. Fan et al. Switching of Perpendicular Magnetization by Spin-Orbit Torques in the Absence of External Magnetic Fields. *Nature Nanotechnology*, 9:548, 2014.
- [11] Z. Wang, W. Zhao, E. Deng, J. O. Klein and C. Chappert. Perpendicular-Anisotropy Magnetic Tunnel Junction Switched by Spin-Hall-Assisted Spin-Transfer Torque. *Journal of Physics D: Applied Physics*, 48(6):065001, 2015.

- [12] Y. W. Oh, S. C. Baek, Y. M. Kim et al. Field-Free Switching of Perpendicular Magnetization Through Spin-Orbit Torque in Antiferromagnet/Ferromagnet/Oxide Structures. *Nature Nanotechnology*, 11(Jul.):1–8, 2016.
- [13] S. Fukami, C. Zhang, S. DuttaGupta, A. Kurenkov and H. Ohno. Magnetization Switching by Spin-Orbit Torque in an Antiferromagnet-Ferromagnet Bilayer System. *Nature Materials*, 2016.
- [14] Y. Lau, D. Betto, K. Rode, J. M. D. Coey and P. Stamenov. Spin-Orbit Torque Switching Without an External Field Using Interlayer Exchange Coupling. *Nature Nanotechnology*, 11:758–762, 2016.
- [15] A. Van Den Brink, G. Vermijs, A. Solignac, J. Koo, J. T. Kohlhepp, H. J. M. Swagten and B. Koopmans. Field-Free Magnetization Reversal by Spin-Hall Effect and Exchange Bias. *Nature Communications*, 7, 2016.
- [16] A. Jaiswal, X. Fong and K. Roy. Comprehensive Scaling Analysis of Current Induced Switching in Magnetic Memories Based on In-Plane and Perpendicular Anisotropies. *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, 6(2):120–133, 2016.
- [17] M. Kazemi, G. E. Rowlands, E. Ipek, R. A. Buhrman and E. G. Friedman. Compact Model for Spin-Orbit Magnetic Tunnel Junctions. *IEEE Transactions on Electron Devices*, 63(2):848–855, 2016.
- [18] K. Jabeur, G. D. Pendina, G. Prenat, L. D. Buda-Prejbeanu and B. Dieny. Compact Modeling of a Magnetic Tunnel Junction Based on Spin Orbit Torque. *IEEE Transactions on Magnetics*, 50(7):1–8, Jul 2014.
- [19] E. Eken, Y. Zhang, B. Yan, W. Wu, H. Li and Y. Chen. Spin-Hall Assisted STT-RAM Design and Discussion. *IEEE International Magnetic Conference, INTERMAG 2015*, 1:2–5, 2015.
- [20] Y. Seo, K. W. Kwon and K. Roy. Area-Efficient SOT-MRAM With a Schottky Diode. *IEEE Electron Device Letters*, 37(8):982–985, 2016.
- [21] L. Chang, Z. Wang, Y. Gao, W. Kang, Y. Zhang, W. Zhao. Evaluation of Spin-Hall-assisted STT-MRAM for Cache Replacement. In *IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)*, pages 73–78, 2016.
- [22] Y. Seo, K. W. Kwon, X. Fong and K. Roy. High Performance and Energy-Efficient On-Chip Cache Using Dual Port (1R/1W) Spin-Orbit Torque MRAM. *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, 6(3):293–304, 2016.
- [23] X. Wang, P. Keshtbod, Z. Wang, K. Satoh, B. K. Yen and Y. Huai. Spin-Orbitronics Memory Device with Matching and Self-Reference Functionality. *IEEE Transactions on Magnetics*, 51(11), 2015.
- [24] Z. Wang, W. Zhao, E. Deng, Y. Zhang and J. O. Klein. Magnetic Non-volatile Flip-Flop with Spin-Hall Assistance. *Physica Status Solidi (RRL) - Rapid Research Letters*, 9(6):375–378, 2015.
- [25] K. W. Kwon, S. H. Choday, Y. Kim, X. Fong, S. P. Park and K. Roy. SHE-NVFF: Spin Hall Effect-Based Nonvolatile Flip-Flop for Power Gating Architecture. *IEEE Electron Device Letters*, 35(4):488–490, 2014.
- [26] K. Jabeur, G. Di Pendina, F. Bernard-Granger and G. Prenat. Spin Orbit Torque Non-Volatile Flip-Flop for High Speed and Low Energy Applications. *IEEE Electron Device Letters*, 35(3):408–410, 2014.
- [27] E. Deng, Z. Wang, J. O. Klein, G. Prenat, B. Dieny and W. Zhao. High-Frequency Low-Power Magnetic Full-Adder Based on Magnetic Tunnel Junction with Spin-Hall Assistance. *IEEE Transactions on Magnetics*, 51(11):1–4, 2015.
- [28] Y. Zhang, B. Yan, W. Wu, H. Li and Y. Chen. Giant Spin Hall Effect (GSHE) Logic Design for Low Power Application. In *Design, Automation and Test in Europe Conference and Exhibition (DATE)*, pages 1000–1005, 2015.
- [29] M. C. Chen, Y. Kim, K. Yogendra and K. Roy. Domino-Style Spin-Orbit Torque-Based Spin Logic. *IEEE Magnetics Letters*, 6, 2015.
- [30] X. Bi, M. Mao, D. Wang and H. Li. Unleashing the Potential of MLC STT-RAM Caches. In *IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, pages 429–436, Nov 2013.
- [31] W. Zhao, C. Chappert, V. Javerliac and J. P. Noziere. High Speed, High Stability and Low Power Sensing Amplifier for MTJ/CMOS Hybrid Logic Circuits. *IEEE Transactions on Magnetics*, 45(10):3784–3787, 2009.
- [32] STMicroelectronics. CMOS40 Design Rule Manual. 2012.
- [33] W. Zhao, S. Chaudhuri, C. Accoto, J. O. Klein, C. Chappert and P. Mazoyer. Cross-Point Architecture for Spin Transfer Torque Magnetic Random Access Memory. *IEEE Transactions on Nanotechnology*, 11(5):907–917, 2012.
- [34] J. Xiao and A. Zangwill. Boltzmann Test of Slonczewski's Theory of Spin-Transfer Torque. *Physical Review B*, 70(17):172405, 2004.